Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)

Authors

  • Brad Carlile
  • Guy Delamarter
  • Paul Kinney
  • Akiko Marti
  • Brian Whitney
Abstract

We introduce the “inverse square root linear unit” (ISRLU) to speed up learning in deep neural networks. ISRLU has better performance than ELU but shares many of its benefits. ISRLU and ELU have similar curves and characteristics: both have negative values, allowing them to push the mean unit activation closer to zero and bring the normal gradient closer to the unit natural gradient, ensuring a noise-robust deactivation state and lessening the risk of overfitting. The significant performance advantage of ISRLU on traditional CPUs also carries over to more efficient hardware implementations in HW/SW co-design for CNNs/RNNs. In experiments with TensorFlow, ISRLU leads to faster learning and better generalization than ReLU on CNNs. This work also suggests a computationally efficient variant called the “inverse square root unit” (ISRU), which can be used for RNNs. Many RNNs use either long short-term memory (LSTM) or gated recurrent unit (GRU) cells, which are implemented with tanh and sigmoid activation functions. ISRU has lower computational complexity but still has a curve similar to tanh and sigmoid.
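As a concrete reference, here is a minimal NumPy sketch of the two activations, assuming the definitions ISRLU(x) = x for x ≥ 0 and x/√(1 + αx²) for x < 0, and ISRU(x) = x/√(1 + αx²), with α controlling the negative saturation value −1/√α. The function names and the choice of α below are illustrative, not taken from the paper's code.

    import numpy as np

    def isrlu(x, alpha=1.0):
        # Identity for non-negative inputs; smooth saturation toward
        # -1/sqrt(alpha) for negative inputs (assumed ISRLU definition).
        return np.where(x >= 0, x, x / np.sqrt(1.0 + alpha * x * x))

    def isru(x, alpha=1.0):
        # tanh/sigmoid-like curve bounded by +/- 1/sqrt(alpha)
        # (assumed ISRU definition), using only multiply, add and rsqrt.
        return x / np.sqrt(1.0 + alpha * x * x)

    x = np.linspace(-4.0, 4.0, 9)
    print(isrlu(x))  # negative side saturates near -1 for alpha = 1
    print(isru(x))   # smooth and bounded in (-1, 1) for alpha = 1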

Related articles

Multiplier-less and Table-less Linear Approximation for Square-Related Functions

Square-related functions such as square, inverse square, square-root and inverse square-root operations are widely used in digital signal processing and digital communication algorithms, and their efficient realizations are commonly required to reduce the hardware complexity. From the implementation point of view, approximate realizations are often desired if they do not degrade performance signi...

Neural Network Learning as an Inverse Problem

The capability of neural networks to generalize when learning from examples can be modelled using regularization, which has been developed as a tool for improving the stability of solutions of inverse problems. Such problems are typically described by integral operators. It is shown that learning from examples can be reformulated as an inverse problem defined by an evaluation operator. This reformula...
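As a hedged illustration of this reformulation (standard Tikhonov notation, not necessarily the cited paper's own), learning from examples (x_i, y_i) amounts to solving L_u f = y for an evaluation operator L_u, stabilized by a regularization term:

    % Standard Tikhonov-regularized formulation (illustrative notation).
    (L_u f)_i = f(x_i), \qquad L_u f = y,
    \qquad
    f_\gamma = \operatorname*{arg\,min}_{f \in \mathcal{H}}
        \frac{1}{m} \sum_{i=1}^{m} \bigl( f(x_i) - y_i \bigr)^2
        + \gamma \, \lVert f \rVert_{\mathcal{H}}^{2}.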

L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks

Batch Normalization (BN) has been proven to be quite effective at accelerating and improving the training of deep neural networks (DNNs). However, BN brings additional computation, consumes more memory and generally slows down the training process by a large margin, which aggravates the training effort. Furthermore, the nonlinear square and root operations in BN also impede the low bit-width qu...
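A minimal NumPy sketch of the L1-norm idea the title refers to, assuming the common formulation in which the square-and-root variance computation is replaced by a mean absolute deviation (the √(π/2) factor rescales it to approximate the standard deviation for Gaussian activations); function and parameter names are illustrative, not the cited paper's exact algorithm.

    import numpy as np

    def l1_batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
        # x: (batch, features). Per-feature mean over the batch.
        mu = x.mean(axis=0)
        # L1 statistic: mean absolute deviation instead of sqrt(variance).
        mad = np.abs(x - mu).mean(axis=0)
        # sqrt(pi/2) * MAD approximates the std under a Gaussian assumption.
        x_hat = (x - mu) / (np.sqrt(np.pi / 2.0) * mad + eps)
        return gamma * x_hat + beta

    x = np.random.default_rng(0).standard_normal((32, 8))
    print(l1_batch_norm(x).std(axis=0))  # roughly 1 per feature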

Improving Deep Neural Networks with Probabilistic Maxout Units

We present a probabilistic variant of the recently introduced maxout unit. The success of deep neural networks utilizing maxout can partly be attributed to favorable performance under dropout, when compared to rectified linear units. However, it also depends on the fact that each maxout unit performs a pooling operation over a group of linear transformations and is thus partially invariant to ch...
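For reference, a minimal NumPy sketch of the pooling a (deterministic) maxout unit performs over a group of k affine transformations; the probabilistic variant discussed in the cited paper samples among the k pieces rather than taking the max. Shapes and names below are illustrative.

    import numpy as np

    def maxout(x, W, b):
        # x: (batch, d_in), W: (k, d_in, d_out), b: (k, d_out).
        z = np.einsum('bi,kio->bko', x, W) + b   # k affine pieces
        return z.max(axis=1)                      # pool over the k pieces

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))
    W = rng.standard_normal((3, 8, 5))
    b = rng.standard_normal((3, 5))
    print(maxout(x, W, b).shape)  # (4, 5)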

Improving the accuracy of the fast inverse square root algorithm

We present improved algorithms for fast calculation of the inverse square root for single-precision floating-point numbers. The algorithms are much more accurate than the famous fast inverse square root algorithm and have the same or similar computational cost. The main idea of our work consists in modifying the Newton-Raphson method and demanding that the maximal error is as small as possible. ...
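For orientation, here is the classic fast inverse square root (magic-constant initial guess plus one Newton-Raphson step) written with NumPy; the cited paper tunes the constant and the Newton-Raphson coefficients to minimize the maximal error, and those improved values are not reproduced here.

    import numpy as np

    def fast_inv_sqrt(x):
        x = np.asarray(x, dtype=np.float32)
        i = x.view(np.int32)                 # reinterpret the float bits
        i = np.int32(0x5f3759df) - (i >> 1)  # classic magic-constant guess
        y = i.view(np.float32)
        # One Newton-Raphson step for f(y) = 1/y^2 - x.
        return y * (np.float32(1.5) - np.float32(0.5) * x * y * y)

    x = np.array([0.25, 1.0, 4.0], dtype=np.float32)
    print(fast_inv_sqrt(x))   # ~ [2.0, 1.0, 0.5]
    print(1.0 / np.sqrt(x))   # reference values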

Journal:
  • CoRR

Volume: abs/1710.09967   Issue: -

Pages: -

Publication date: 2017